Make Groovy Parser correctly handle nested parenthesis #4718

jevanlingen · 2024-11-26T09:33:01Z

What's changed?

Nested parentheses around method invocations are supported as well for the groovy parser.

What's your motivation?

Checklist

I've added unit tests to cover both positive and negative cases
I've read and applied the recipe conventions and best practices
I've used the IntelliJ IDEA auto-formatter on affected files

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserVisitor.java

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserParentheseDiscoverer.java

rewrite-groovy/src/test/java/org/openrewrite/groovy/GroovyParserParentheseDiscovererTest.java

…ssing

rewrite-groovy/src/test/java/org/openrewrite/groovy/tree/MethodInvocationTest.java

rewrite-java/src/main/java/org/openrewrite/java/tree/Space.java

Laurens-W

LGTM but since it's an extensive change let's let Tim, Sam or Knut take a look too!

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyPrinter.java

rewrite-groovy/src/test/java/org/openrewrite/groovy/tree/RangeTest.java

rewrite-groovy/src/test/java/org/openrewrite/groovy/tree/CastTest.java

rewrite-groovy/src/test/java/org/openrewrite/groovy/GroovyParserParentheseDiscovererTest.java

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserVisitor.java

timtebeek · 2024-12-05T13:09:51Z

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserParentheseDiscoverer.java

+    public static @Nullable Integer getInsideParenthesesLevel(MethodCallExpression node, String source, int cursor) {
+        String sourceFromCursor = source.substring(cursor);


I'm somewhat surprised to see source and cursor as separate arguments here when they are merely used to do this substring. Would it make sense to change the method signature here to

Suggested change

public static @Nullable Integer getInsideParenthesesLevel(MethodCallExpression node, String source, int cursor) {

String sourceFromCursor = source.substring(cursor);

public static @Nullable Integer getInsideParenthesesLevel(MethodCallExpression node, String sourceFromCursor) {

I had that before, but reverted it back to current state. I think it more clear if the GroovyParserParentheseDiscoverer is a completely separate entity. So no former logic is needed to determine the wrapping. Just pass everything you have (thus the node, the source code of the entire file and the current place you are parsing right now) and let the function discover it all.

Having the cursor argument also prevents from misinterpretation to function, because you need to provide the starting point (that is the cursor) of parsing.

I wonder if this can be simplified. For Python I had the same basic problem to solve, but I didn't have any "parentheses level" from the AST or similar. Additionally, in Python the parentheses for tuples and generator expressions are sometimes optional, which doesn't make things simpler.

Anyway, rather than counting the "parentheses levels" up front, I added something like a maybeParenthesized(node, delegate) method which does something like this:

Save cursor and use whitespace() to parse whitespace (and comments) at cursor into a Space

Check if the next character is a (

If not, reset cursor to saved cursor and return result from delegate and we are done

Otherwise push a lambda on a parentheses stack, which is a field on the parser visitor

Call maybeParenthesized() recursively

Again, save cursor, call whitespace() and check if we are at a )

If yes, pop lambda from parentheses stack and apply it to result, which will wrap it in a J.Parentheses

If not, reset cursor and return result

For a language like Java or Python, where we can expect the source files to be reasonably large, I would definitely do something like this, as the substringing and regexp matching won't be free in terms of performance. But for Groovy that would probably not be a big concern, as scripts tend to be small and few.

I will let you think a bit about this and don't want to force you into a different solution.

For reference, here is the Python code in question. It isn't pretty, so in case you decide to give this approach a try, I would only look at the Python code if you get stuck, as you will probably find a cleaner solution for your parser: https://github.com/openrewrite/rewrite-python/blob/b42101961bbc9daf21d54fe803dd9eaec59c3184/rewrite/rewrite/python/_parser_visitor.py#L1950-L2025

I second Knut's comments and like the approach he describes. I worry that these regexes will be slow and error prone.

The one tricky situation I can think of are Java-style type casts, as the first token there is also a (. So if the AST node is a Java-style type cast (assuming they can be distinguished from Groovy-style type casts in the AST), the logic could for example pop that last element from the stack and reset the cursor before calling the corresponding visit method (or do it in the method itself). There are also other alternative solutions.

I agree, your implementation seems like a better idea @knutwannheden. I would rather first merge this and then do a second round with improvements (or better said, go for the new/better implementation). Currently we don't support nested parenthesis, which just breaks scripts that use them. Having this sub-ideal implementation makes them at least runnable, though with a performance loss.

Then I can start working on the better implementation

--

Edit: Talked with @knutwannheden about this, we concluded it's better to not go forward yet:

If we aren't in a hurry to get this done, it will just make the Git history messier to go through, when someone tries to find out why some change was made in the past.

It isn't quite as easy as I made it out to be and I now remember the struggles I had with Python here. The difficulty arises because we don't know at what AST level the parentheses are. Here that parentheses level may come in handy.

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserParentheseDiscoverer.java

…erParentheseDiscoverer.java Co-authored-by: Tim te Beek <[email protected]>

sambsnyd · 2024-12-05T14:24:39Z

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserParentheseDiscoverer.java

+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+class GroovyParserParentheseDiscoverer {


I haven't reviewed in detail but at first glance it doesn't seem to be aware of comments so any parentheses in comments might throw it off

shanman190 · 2024-12-05T14:46:27Z

This is just my 2c, but the several regexes makes me feel like the solution here isn't quite right. For trivial source files found in the test cases, it works. But I fear that for actual source files that this would bring G-based parsing to a crawl.

…-nested-parenthesis # Conflicts: # rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserVisitor.java # rewrite-groovy/src/test/java/org/openrewrite/groovy/tree/MethodInvocationTest.java # rewrite-java/src/main/java/org/openrewrite/java/tree/Space.java

…-nested-parenthesis

timtebeek · 2024-12-19T16:56:18Z

Fixed in Make Groovy Parser correctly handle nested parenthesis #4801

jevanlingen linked an issue Nov 26, 2024 that may be closed by this pull request

Groovy parser does not correctly handle nested parenthesis #4703

Closed

jevanlingen force-pushed the 4703-groovy-parser-does-not-correctly-handle-nested-parenthesis branch from 8579420 to 378fe5b Compare November 26, 2024 09:37

jevanlingen commented Nov 26, 2024

View reviewed changes

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserVisitor.java Show resolved Hide resolved

timtebeek assigned jevanlingen Nov 26, 2024

timtebeek added parser-groovy bug Something isn't working labels Nov 26, 2024

jevanlingen changed the title ~~Skip ending parentheses when parsing visitMethodCallExpression~~ Make Groovy Parser correctly handle nested parenthesis Nov 26, 2024

timtebeek mentioned this pull request Nov 26, 2024

Also test with Groovy 2.5 on classpath for Gradle 4, 5 & 6 #4520

Closed

jevanlingen removed their assignment Nov 26, 2024

This comment was marked as outdated.

Sign in to view

jevanlingen force-pushed the 4703-groovy-parser-does-not-correctly-handle-nested-parenthesis branch from bc3b03b to fe98f02 Compare November 29, 2024 12:49

This comment was marked as outdated.

Sign in to view

jevanlingen force-pushed the 4703-groovy-parser-does-not-correctly-handle-nested-parenthesis branch 2 times, most recently from 5c22da6 to fe378e2 Compare December 3, 2024 11:09

github-actions bot reviewed Dec 4, 2024

View reviewed changes

rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyParserParentheseDiscoverer.java Show resolved Hide resolved

rewrite-groovy/src/test/java/org/openrewrite/groovy/GroovyParserParentheseDiscovererTest.java Show resolved Hide resolved

jevanlingen self-assigned this Dec 4, 2024

jevanlingen added 11 commits December 5, 2024 09:39

Skip ending parentheses when parsing visitMethodCallExpression

b04247c

Stop duplicate code in method

4191337

Introduce visitAndGetAnnotations method

8963f1e

Rename ctor to contructor_

c352e7c

Add marker imports

09aa9c3

Add ClosingParenthese marker to add extra parentheses

0de94eb

Add ClosingParenthese marker to add extra parentheses

2b638a8

Improve space printer

d7afb0c

Calculate _INSIDE_PARENTHESES_LEVEL manually if metadata flag is mi…

8376b4e

…ssing

improvement

da61c8d

improvement

d292e34

Laurens-W reviewed Dec 5, 2024

View reviewed changes

rewrite-groovy/src/test/java/org/openrewrite/groovy/tree/MethodInvocationTest.java Show resolved Hide resolved

Laurens-W reviewed Dec 5, 2024

View reviewed changes

rewrite-java/src/main/java/org/openrewrite/java/tree/Space.java Outdated Show resolved Hide resolved

Laurens-W approved these changes Dec 5, 2024

View reviewed changes

Add test from issue #4615

560a2a8